Index in Teradata

An index is a mechanism that the SQL query optimizer can use to make table access more efficient. Indexes enhance data access by providing a more-or-less direct path to stored data to avoid performing full table scans to locate the small number of rows you typically want to retrieve or update. The Teradata Database parallel architecture makes indexing an aid to better performance, not a crutch necessary to ensure adequate performance. Full table scans are not something to be feared in the Teradata Database environment. This means that the sorts of unplanned, ad hoc queries that characterize the data warehouse process, and that often are not supported by indexes, perform very effectively for Teradata Database using full table scans.

Selectivity of Indexes
  • An index that retrieves many rows is said to have weak selectivity.
  • An index that retrieves few rows is said to be strongly selective.
The more strongly selective an index is, the more useful it is. In some cases, it is possible to link together several weakly selective nonunique secondary indexes by bit mapping them. The result is effectively a strongly selective index and a dramatic reduction in the number of table rows that must be accessed.

Row Hash and RowID
Teradata Database table rows are self-indexing with respect to their primary index and so require no additional storage space. When a row is inserted into a table, Teradata Database stores the 32-bit row hash value of the primary index with it. Because row hash values are not necessarily unique, Teradata Database also generates a unique 32-bit numeric value (called the Uniqueness Value) that it appends to the row hash value, forming a unique RowID. This RowID makes each row in a table uniquely identifiable and ensures that hash collisions do not occur.

If a table is defined with a partitioned primary index (PPI), the RowID also includes the combined partition number to which the row is assigned, where the combined partition number is derived from the partition numbers for each level of the PPI.

For Table With NoPPI
  • The first row having a specific row hash value is always assigned a uniqueness value of 1, which becomes the current high uniqueness value. Each time another row having the same row hash value is inserted, the current high value increments by 1, and that value is assigned to the row.
  • Table rows having the same row hash value are stored on disk sorted in the ascending order of RowID. 
  • Uniqueness values are not reused except for the special case in which the highest valued row within a row hash is deleted from a table.
For Table With PPI
  • The first row having a specific combined partition number and row hash value is always assigned a uniqueness value of 1, which becomes the highest current uniqueness value. Each time another row having the same combined partition number and row hash value is inserted, the current high value increments by 1, and that value is assigned to the row.
  • Table rows having the same combined partition number and row hash value are stored on disk sorted in the ascending order of RowID. 
  • Uniqueness values are not reused except for the special case in which the highest valued row within a combined partition number and row hash is deleted from a table.
Advantages of Indexes
  • The intent of indexes is to lessen the time it takes to retrieve rows from a database. The faster the retrieval, the better.
Disadvantages of Indexes
  • They must be updated every time a row is updated, deleted, or added to a table. This is only a consideration for indexes other than the primary index in the Teradata Database environment. The more indexes you have defined for a table, the bigger the potential update downside becomes. 
  • All Teradata Database secondary indexes are stored in subtables, and join and hash indexes are stored in separate tables, exerting a burden on system storage space.
  • When FALLBACK is defined for a table, a further storage space burden is created because secondary index subtables are always duplicated whenever FALLBACK is defined for a table. An additional burden on system storage space is exerted when FALLBACK is defined for join indexes or hash indexes or both.
For this reason, it is extremely important to use the EXPLAIN modifier to determine optimum data manipulation statement syntax and index usage before putting statements and indexes to work in a production environment.

Teradata Database Index Types
  • Primary index:- In general, all Teradata Database tables require a primary index because the system distributes tables on their primary indexes. 
  • Primary indexes can be: ( Unique or nonunique,Partitioned or nonpartitioned.)
  • Secondary index (Secondary indexes can be unique or nonunique)
  • Join index (JI)
  • Hash index
  • Unique Indexes
Nonunique Indexes
A nonunique index does not require its values to be unique. There are occasions when a nonunique index is the best choice as the primary index for a table. NUSIs are also very useful for many decision support situations.

Partitioned and Nonpartitioned Primary Indexes
Primary indexes can be partitioned or nonpartitioned. A nonpartitioned primary index (NPPI) is the traditional primary index by which rows are assigned to AMPs. A partitioned primary index (PPI) allows rows to be partitioned, based on some set of columns, on the AMP to which they are distributed, and ordered by the hash of the primary index columns within the partition.

A PPI can improve query performance through partition elimination. A PPI provides a useful alternative to an NPPI for executing range queries against a table, while still providing efficient access, join, and aggregation strategies on the primary index.

A multilevel PPI allows each partition at a level to be subpartitioned based on a partitioning expression, where the maximum number of levels is 15. A multilevel PPI provides multiple access paths to the rows in the base table and can improve query performance through partition elimination at each of the various levels or combination of levels.

A PPI can only be defined as unique if all the partitioning columns are included in the set of primary index columns.

Join Indexes
A join index is an indexing structure containing columns from one or more base tables and is generally used to resolve queries and eliminate the need to access and join the base tables it represents. Teradata Database join indexes can be defined in the following general ways.

  • Simple or aggregate
  • Single-table or multitable
  • Hash-ordered or value-ordered
  • Complete or sparse
Difference Between Primary Index and Primary key 
  • Primary key can not be NULL Primary Index can be NULL
  • Primary key is not mandatory in Teradata but Primary index is mandatory in Teradata.
  • Primary Key not helps in data Distribution  where as Primary Index help in data Distribution.
  • Primary Key is always unique where as Primary index Unique and Non Unique Both.
  • Primary Key is Logical Implementation where as Primary Index is Physical Implementation.
 

No comments:

Post a Comment